The objective of this notebook is to study the COVID-19 outbreak with the help of basic visualization techniques, to compare the spread of COVID-19 across countries of the world, and to perform predictions and time-series forecasting in order to study the impact and spread of COVID-19 in the coming days.

Dataset source: The Roche Data Science Coalition (RDSC) is requesting the collaborative effort of the AI community to fight COVID-19. This challenge presents a curated collection of datasets from 20 global sources and asks you to model solutions to key questions that were developed and evaluated by a global frontline of healthcare providers, hospitals, suppliers, and policy makers.

  1. We read the Novel-Corona-Virus-2019 dataset, managed by Johns Hopkins University, into this notebook. The dataset holds the cumulative COVID-19 case counts across the world and can be viewed and downloaded from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data

  2. The dataset CovCSD - COVID-19 Countries Statistical Dataset, created by me (available at https://www.kaggle.com/aestheteaman01/covcsd-covid19-countries-statistical-dataset), is loaded here. Information about its contents can be found in the dataset's description section.

  3. COVID-19 UNCOVER Collection of Datasets available from Kaggle.

  4. US-Counties Covid-19 Dataset

What do we know at this point?

The risk of serious disease and death from Covid-19 rises with age, so there is increasing concern for older adults, who have a higher risk of developing serious illness if they are infected. -Reports from https://www.statnews.com news

The datasets mentioned under this challenge take data from Worldometer, which reports similar figures for the age-group-wise distribution of COVID-19 cases. Those figures also highlight that people with pre-existing COPD or other medical ailments have a higher risk of severe COVID-19 infection.

We are going to analyze several datasets to understand this fact much better.

Importing the Essential Libraries

In [2]:
#Data Analysis Libraries
import pandas as pd                
import numpy as np    
from urllib.request import urlopen
import json
import glob
import os

#Importing Data plotting libraries
import matplotlib.pyplot as plt     
import plotly.express as px       
import plotly.offline as py       
import seaborn as sns             
import plotly.graph_objects as go 
from plotly.subplots import make_subplots
import matplotlib.ticker as ticker
import matplotlib.animation as animation
%matplotlib inline

#Other Miscellaneous Libraries
import warnings
warnings.filterwarnings('ignore')

from IPython.display import HTML
import matplotlib.colors as mc
import colorsys
from random import randint
import re
In [3]:
from pathlib import Path, PureWindowsPath

# Get the path of current working directory and add COVID directory path to it
path = os.getcwd()
biorxiv_dir = "C:\\Users\\Vejendla\\Desktop\\CIS-732\\novel-corona-virus-2019-dataset\\"
print(biorxiv_dir)

# Convert path to Windows format
#path_on_windows = PureWindowsPath(biorxiv_dir)
#print(path_on_windows)
filenames = os.listdir(biorxiv_dir)
print("Number of files found in the dataset directory:", len(filenames))
C:\Users\Vejendla\Desktop\CIS-732\novel-corona-virus-2019-dataset\
Number of files found in the dataset directory: 8
In [4]:
#Reading the cumulative cases dataset
covid_cases = pd.read_csv(r'C:\Users\Vejendla\Desktop\CIS-732\novel-corona-virus-2019-dataset\covid_19_data.csv')

#Viewing the dataset
covid_cases.head()
Out[4]:
SNo ObservationDate Province/State Country/Region Last Update Confirmed Deaths Recovered
0 1 01/22/2020 Anhui Mainland China 1/22/2020 17:00 1.0 0.0 0.0
1 2 01/22/2020 Beijing Mainland China 1/22/2020 17:00 14.0 0.0 0.0
2 3 01/22/2020 Chongqing Mainland China 1/22/2020 17:00 6.0 0.0 0.0
3 4 01/22/2020 Fujian Mainland China 1/22/2020 17:00 1.0 0.0 0.0
4 5 01/22/2020 Gansu Mainland China 1/22/2020 17:00 0.0 0.0 0.0

Further Analysis of the dataset

The following are the procedures taken into consideration.

We group the dataset country-wise; the rows for whichever country we want to check are later fetched from the grouped dataset.

In [5]:
#Grouping the same cities and countries together along with their successive dates.

country_list = covid_cases['Country/Region'].unique()

country_grouped_covid = pd.DataFrame()   #Start with an empty frame so the first row is not duplicated

for country in country_list:
    test_data = covid_cases['Country/Region'] == country   
    test_data = covid_cases[test_data]
    country_grouped_covid = pd.concat([country_grouped_covid, test_data], axis=0)
    
country_grouped_covid = country_grouped_covid.reset_index(drop=True)

#Dropping the Last Update column
country_grouped_covid.drop('Last Update', axis=1, inplace=True)

#Replacing NaN Values in Province/State with a string "Not Reported"
country_grouped_covid['Province/State'].replace(np.nan, "Not Reported", inplace=True)

#Printing the dataset
country_grouped_covid.head()

#country_grouped_covid holds the dataset for the country
Out[5]:
SNo ObservationDate Province/State Country/Region Confirmed Deaths Recovered
0 1 01/22/2020 Anhui Mainland China 1.0 0.0 0.0
1 2 01/22/2020 Beijing Mainland China 14.0 0.0 0.0
2 3 01/22/2020 Chongqing Mainland China 6.0 0.0 0.0
3 4 01/22/2020 Fujian Mainland China 1.0 0.0 0.0
4 5 01/22/2020 Gansu Mainland China 0.0 0.0 0.0
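As a side note, the per-country concat loop above can be collapsed into a single stable sort. One difference to be aware of: the loop orders countries by first appearance, while `sort_values` orders them alphabetically. A minimal sketch on toy data:

```python
import pandas as pd

# Toy frame standing in for the notebook's covid_cases
covid_cases = pd.DataFrame({
    'Country/Region': ['Mainland China', 'US', 'Mainland China', 'US'],
    'Confirmed': [1.0, 5.0, 14.0, 7.0],
})

# A stable sort (mergesort) groups all rows of each country together
# while preserving the original (date) order within each country,
# which is what the concat loop achieves.
country_grouped = covid_cases.sort_values(
    'Country/Region', kind='mergesort'
).reset_index(drop=True)
```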
In [10]:
#Creating a dataset to analyze the cases country wise - As of 05/04/2020

latest_data = country_grouped_covid['ObservationDate'] == '05/04/2020'
country_data = country_grouped_covid[latest_data]

#The total number of reported Countries
country_list = country_data['Country/Region'].unique()
print("The total number of countries with COVID-19 Confirmed cases as of 4th May 2020:  {}".format(country_list.size))
The total number of countries with COVID-19 Confirmed cases as of 4th May 2020:  189
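As an aside, the same country count can be obtained in one step with pandas' `nunique`; a minimal sketch on a toy frame (in the notebook this would be called on `country_data['Country/Region']`):

```python
import pandas as pd

# Toy stand-in for the filtered country_data frame
country_data = pd.DataFrame({
    'Country/Region': ['US', 'US', 'Italy', 'Spain'],
    'Confirmed': [100, 120, 90, 80],
})

# nunique counts distinct values directly, without building
# an intermediate array of uniques first.
n_countries = country_data['Country/Region'].nunique()
```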

Analyze the data using Choropleth and Bar plots:

* Plotting a Running Map for observing the spread of COVID-19 Confirmed Cases

In [15]:
#Creating the interactive map
py.init_notebook_mode(connected=True)

#GroupingBy the dataset for the map
DateCountry_cdf = covid_cases.groupby(['ObservationDate', 'Country/Region'])[['Confirmed', 'Deaths', 'Recovered']].max()
DateCountry_cdf = DateCountry_cdf.reset_index()
DateCountry_cdf['Date'] = pd.to_datetime(DateCountry_cdf['ObservationDate'])
DateCountry_cdf['Date'] = DateCountry_cdf['Date'].dt.strftime('%m/%d/%Y')

DateCountry_cdf['log_ConfirmedCases'] = np.log(DateCountry_cdf.Confirmed + 1)
DateCountry_cdf['log_Fatalities'] = np.log(DateCountry_cdf.Deaths + 1)

#Plotting the figure
fig = px.choropleth(DateCountry_cdf, locations="Country/Region", locationmode='country names', 
                     color="log_ConfirmedCases", hover_name="Country/Region",projection="mercator",
                     animation_frame="Date",width=1000, height=1000,
                     color_continuous_scale=px.colors.sequential.Viridis,
                     title='The Spread of COVID-19 Cases Across the World')

#Showing the figure
fig.update(layout_coloraxis_showscale=True)
py.offline.iplot(fig)
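An aside on the colour scale used above: `np.log(x + 1)` is the same quantity as NumPy's `np.log1p(x)`, which is the numerically safer spelling for small counts. A quick sanity check:

```python
import numpy as np

# Cumulative counts, including the zero-case entries the +1 guards against
confirmed = np.array([0.0, 1.0, 9.0, 99999.0])

# log1p computes log(1 + x) accurately even for tiny x
assert np.allclose(np.log(confirmed + 1), np.log1p(confirmed))
```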

* Plotting a Running Map for observing the spread of COVID-19 Deaths

In [16]:
#Plotting the figure for Fatalities
fig = px.choropleth(DateCountry_cdf, locations="Country/Region", locationmode='country names', 
                     color="log_Fatalities", hover_name="Country/Region",projection="mercator",
                     animation_frame="Date",width=1000, height=1000,
                     color_continuous_scale=px.colors.sequential.Viridis,
                     title='Deaths due to COVID-19 Across the World')

#Showing the figure
fig.update(layout_coloraxis_showscale=True)
py.offline.iplot(fig)

* Analysis of Spread and deaths due to COVID-19 from Bar Graphs

In [17]:
#Plotting a bar graph for confirmed cases vs deaths due to COVID-19 in World using plotly

unique_dates = country_grouped_covid['ObservationDate'].unique()
confirmed_cases = []
recovered = []
deaths = []

for date in unique_dates:
    date_wise = country_grouped_covid['ObservationDate'] == date  
    test_data = country_grouped_covid[date_wise]
    
    confirmed_cases.append(test_data['Confirmed'].sum())
    deaths.append(test_data['Deaths'].sum())
    recovered.append(test_data['Recovered'].sum())
    
#Converting the lists to a pandas dataframe.
country_dataset = {'Date' : unique_dates, 'Confirmed' : confirmed_cases, 'Recovered' : recovered, 'Deaths' : deaths}
country_dataset = pd.DataFrame(country_dataset)

#Plotting the Graph of Cases vs Deaths Globally.
fig = go.Figure()
fig.add_trace(go.Bar(x=country_dataset['Date'],y=country_dataset['Confirmed'], name='Confirmed Cases of COVID-19', marker_color='rgb(55, 83, 109)'))
fig.add_trace(go.Bar(x=country_dataset['Date'],y=country_dataset['Deaths'],name='Total Deaths because of COVID-19',marker_color='rgb(26, 118, 255)'))

fig.update_layout(title='Confirmed Cases and Deaths from COVID-19',xaxis_tickfont_size=14,
                  yaxis=dict(title='Reported Numbers',titlefont_size=16,tickfont_size=14,),
    legend=dict(x=0,y=1.0,bgcolor='rgba(255, 255, 255, 0)',bordercolor='rgba(255, 255, 255, 0)'),barmode='group',bargap=0.15, bargroupgap=0.1)
fig.show()


fig = go.Figure()
fig.add_trace(go.Bar(x=country_dataset['Date'], y=country_dataset['Confirmed'], name='Confirmed Cases of COVID-19', marker_color='rgb(55, 83, 109)'))
fig.add_trace(go.Bar(x=country_dataset['Date'],y=country_dataset['Recovered'],name='Total Recoveries because of COVID-19',marker_color='rgb(26, 118, 255)'))

fig.update_layout(title='Confirmed Cases and Recoveries from COVID-19',xaxis_tickfont_size=14,
                  yaxis=dict(title='Reported Numbers',titlefont_size=16,tickfont_size=14,),
    legend=dict(x=0,y=1.0,bgcolor='rgba(255, 255, 255, 0)',bordercolor='rgba(255, 255, 255, 0)'),
    barmode='group',bargap=0.15, bargroupgap=0.1)
fig.show()
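The date-wise accumulation loop above can also be written as a single `groupby` aggregation; `sort=False` keeps the dates in their order of first appearance, matching the loop over `unique()`. A minimal sketch with toy rows shaped like `country_grouped_covid`:

```python
import pandas as pd

# Toy stand-in for country_grouped_covid
country_grouped_covid = pd.DataFrame({
    'ObservationDate': ['01/22/2020', '01/22/2020', '01/23/2020'],
    'Confirmed': [1.0, 14.0, 20.0],
    'Deaths': [0.0, 0.0, 1.0],
    'Recovered': [0.0, 0.0, 2.0],
})

# One groupby replaces the per-date boolean-mask loop
country_dataset = (
    country_grouped_covid
    .groupby('ObservationDate', sort=False)[['Confirmed', 'Deaths', 'Recovered']]
    .sum()
    .reset_index()
)
```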

From the graph of confirmed cases vs deaths we observe the following trends.

On March 17th 2020, 56 days after the first confirmed case of COVID-19, the global count of confirmed COVID-19 cases crossed the 200k mark.
Within 7 days, on March 24th 2020, the global confirmed case count passed the 400k mark.
It took 3 days, from March 24th 2020 to March 27th 2020, for the global confirmed case count to reach the 600k mark.
A similar pace continued: on April 2nd 2020 the 1 million mark was crossed.
Within the next 2 days, 200k more confirmed cases were added.
The total number of recovered cases was far lower than the confirmed cases: 20.55% of confirmed cases had recovered as of April 6th 2020.
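The milestone dates quoted above can be extracted programmatically: for each threshold, take the first date on which the cumulative global total crosses it. A minimal sketch with made-up totals; in the notebook this would run on the `country_dataset` frame built earlier:

```python
import pandas as pd

# Toy daily cumulative totals (stand-in for the notebook's country_dataset)
totals = pd.DataFrame({
    'Date': ['03/16/2020', '03/17/2020', '03/24/2020', '03/27/2020'],
    'Confirmed': [190000, 205000, 410000, 600000],
})

def first_crossing(df, threshold):
    """Return the first Date on which Confirmed reaches the threshold."""
    hit = df[df['Confirmed'] >= threshold]
    return hit['Date'].iloc[0] if not hit.empty else None

milestones = {t: first_crossing(totals, t) for t in (200000, 400000, 600000)}
```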

1. Which countries have heavily tested for COVID-19?

In [18]:
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
In [19]:
from pathlib import Path, PureWindowsPath

# Get the path of current working directory and add COVID directory path to it
path = os.getcwd()
biorxiv_dir = "C:\\Users\\Vejendla\\Desktop\\CIS-732\\uncover\\UNCOVER\\our_world_in_data\\"
print(biorxiv_dir)

# Convert path to Windows format
#path_on_windows = PureWindowsPath(biorxiv_dir)
#print(path_on_windows)
filenames = os.listdir(biorxiv_dir)
print("Number of files found in the dataset directory:", len(filenames))
C:\Users\Vejendla\Desktop\CIS-732\uncover\UNCOVER\our_world_in_data\
Number of files found in the dataset directory: 4
In [20]:
tests_by_country = pd.read_csv(biorxiv_dir+'total-covid-19-tests-performed-by-country.csv')
tests_per_million = pd.read_csv(biorxiv_dir+'total-covid-19-tests-performed-per-million-people.csv')
tests_vs_confirmed =  pd.read_csv(biorxiv_dir+'tests-conducted-vs-total-confirmed-cases-of-covid-19.csv')
testspermillion_vs_confirmed =  pd.read_csv(biorxiv_dir+'per-million-people-tests-conducted-vs-total-confirmed-cases-of-covid-19.csv')
tests_by_country.head(2)
Out[20]:
entity code date total_covid_19_tests
0 Armenia ARM 2020-03-18 813
1 Australia AUS 2020-03-20 113615
In [21]:
tests_per_million.head(2)
Out[21]:
entity code date total_covid_19_tests_per_million_people
0 Armenia ARM 2020-03-18 276.7
1 Australia AUS 2020-03-20 4473.4
In [22]:
tests_vs_confirmed.head(2)
Out[22]:
entity code date total_covid_19_tests total_confirmed_cases_of_covid_19_cases
0 Afghanistan AFG 2019-12-31 NaN 0.0
1 Afghanistan AFG 2020-01-01 NaN 0.0
In [23]:
testspermillion_vs_confirmed.head(2)
Out[23]:
entity code date total_covid_19_tests_per_million_people total_confirmed_cases_of_covid_19_per_million_people_cases_per_million
0 Afghanistan AFG 2019-12-31 NaN 0.0
1 Afghanistan AFG 2020-01-01 NaN 0.0
In [24]:
tests_merged = pd.merge(tests_by_country, tests_per_million, on='entity')
tests_merged = tests_merged.drop(['code_y', 'date_y'], axis = 1)
tests_merged = tests_merged.rename(columns = {'code_x': 'code','date_x':'date'}) 
tests_merged.head()
Out[24]:
entity code date total_covid_19_tests total_covid_19_tests_per_million_people
0 Armenia ARM 2020-03-18 813 276.7
1 Australia AUS 2020-03-20 113615 4473.4
2 Australia - Australian Capital Territory NaN 2020-03-20 2062 4832.3
3 Australia - New South Wales NaN 2020-03-19 39089 4832.1
4 Australia - Queensland NaN 2020-03-19 27000 5299.2
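Note that `tests_merged` mixes national rows with subnational entities whose ISO `code` is missing (the Australian states above). For a strictly country-level view, the rows without a code can be dropped; a minimal sketch on toy data:

```python
import pandas as pd

# Toy stand-in for tests_merged: one subnational row with a missing code
tests_merged = pd.DataFrame({
    'entity': ['Australia', 'Australia - New South Wales', 'Armenia'],
    'code': ['AUS', None, 'ARM'],
    'total_covid_19_tests': [113615, 39089, 813],
})

# Keep only rows with a national ISO code
countries_only = tests_merged[tests_merged['code'].notna()].reset_index(drop=True)
```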
In [42]:
#Total Tests vs Entity/Country
sorted_by_tests = tests_merged.sort_values('total_covid_19_tests')
plt.figure(figsize=(30,25))
plt.barh('entity','total_covid_19_tests', data=sorted_by_tests)
plt.xlabel("total_covid_19_tests", size=15)
plt.ylabel("entity", size=15)
plt.tick_params(axis='x', rotation = 90, labelsize = 18)
plt.tick_params(axis='y', labelsize = 18) 
plt.title("total covid_19 Tests vs Entity", size=50);
In [29]:
#Using a choropleth for better visualization
data = dict(type = 'choropleth',
            locations = tests_merged['entity'],
            locationmode = 'country names',
            autocolorscale = False,
            colorscale = 'Rainbow',
            text= tests_merged['entity'],
            z=tests_merged['total_covid_19_tests'],
            marker = dict(line = dict(color = 'rgb(255,255,255)',width = 1)),
            colorbar = {'title':'Tests Performed','len':0.25,'lenmode':'fraction'})
layout = dict(geo = dict(scope='world'), width = 1000, height = 600)

worldmap = go.Figure(data = [data],layout = layout)
iplot(worldmap)
In [43]:
# Total Tests vs Date
sorted_by_date = tests_merged.sort_values('date')
plt.figure(figsize=(25,10))
plt.barh('date','total_covid_19_tests', data=sorted_by_date)
plt.xlabel("total_covid_19_tests", size=15)
plt.ylabel("date", size=15)
plt.tick_params(axis='x',  labelsize = 20)
plt.tick_params(axis='y', labelsize = 20) 
plt.title("total covid_19 tests vs date", size=40);
In [33]:
#Analyzing Confirmed Cases 
grouped_by_entity = tests_vs_confirmed.groupby('entity').sum()['total_confirmed_cases_of_covid_19_cases'].sort_values(ascending=False).to_frame(name = 'Sum').reset_index()
grouped_by_entity.head()
Out[33]:
entity Sum
0 World 6503477.0
1 China 3740830.0
2 Italy 644524.0
3 Iran 306009.0
4 United States 303652.0
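One caveat on the aggregation above: `total_confirmed_cases_of_covid_19_cases` is itself a cumulative series, so summing it adds every daily snapshot together and overstates the totals. Taking the per-entity maximum recovers the latest cumulative figure instead. A minimal sketch with toy numbers:

```python
import pandas as pd

# Toy cumulative series for one entity across three days
tests_vs_confirmed = pd.DataFrame({
    'entity': ['Italy', 'Italy', 'Italy'],
    'total_confirmed_cases_of_covid_19_cases': [100.0, 250.0, 400.0],
})

# .sum() piles up every daily cumulative snapshot (750 here);
# .max() recovers the latest cumulative figure (400).
latest = (tests_vs_confirmed
          .groupby('entity')['total_confirmed_cases_of_covid_19_cases']
          .max())
```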
In [45]:
#Removing the entry for 'World' and all the 0 confirmed entries 
grouped_by_entity = grouped_by_entity[(grouped_by_entity['entity'] != 'World')]
grouped_by_entity = grouped_by_entity[(grouped_by_entity['Sum'] != 0)]
grouped_by_entity.size   #Note: DataFrame.size counts cells (rows x columns), so 50 means 25 entities
Out[45]:
50
In [46]:
grouped_by_entity = grouped_by_entity.sort_values('Sum').head(25)
plt.figure(figsize=(20,10))
plt.barh('entity','Sum', data=grouped_by_entity)
plt.xlabel("total_confirmed_cases_of_covid_19_cases", size=15)
plt.ylabel("entity", size=15)
plt.tick_params(axis='x',  labelsize = 18)
plt.tick_params(axis='y', labelsize = 18) 
plt.title("total_covid_19_confirmed_cases Vs entity", size=40);

Although the confirmed cases are relatively low in the majority of the world, the depiction below shows the spread and the skewness of the confirmed cases.

In [47]:
hospitalization = pd.read_csv(r'C:\Users\Vejendla\Desktop\CIS-732\uncover\UNCOVER\ihme\2020_03_30\Hospitalization_all_locs.csv')
hospitalization.head(2)
Out[47]:
V1 location date allbed_mean allbed_lower allbed_upper ICUbed_mean ICUbed_lower ICUbed_upper InvVen_mean ... totdea_mean totdea_lower totdea_upper bedover_mean bedover_lower bedover_upper icuover_mean icuover_lower icuover_upper location_name
0 1 Wyoming 2020-02-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Wyoming
1 2 Wyoming 2020-02-07 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Wyoming

2 rows × 31 columns

In [48]:
hospitalization.describe()
Out[48]:
V1 allbed_mean allbed_lower allbed_upper ICUbed_mean ICUbed_lower ICUbed_upper InvVen_mean InvVen_lower InvVen_upper ... newICU_upper totdea_mean totdea_lower totdea_upper bedover_mean bedover_lower bedover_upper icuover_mean icuover_lower icuover_upper
count 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 ... 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000 9955.000000
mean 91.000000 1801.707459 770.896291 3330.720107 267.905914 114.040344 496.388970 214.325205 91.210149 397.261597 ... 62.785696 1750.386488 824.401341 3031.684849 289.030028 111.622164 796.229823 88.751131 31.437686 260.768845
std 52.252026 12038.435235 5967.455359 20845.140709 1821.436570 908.885339 3141.947162 1457.161199 727.336440 2513.918406 ... 394.519131 8219.401792 3829.679661 14385.022206 3432.652042 1666.584669 6653.399333 830.787206 385.741655 1872.165876
min 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 46.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 2.000000 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 91.000000 16.402381 0.000000 47.300395 1.605850 0.000000 5.600000 1.286835 0.000000 4.512500 ... 0.000000 294.633000 150.000000 523.125000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 136.000000 593.786507 147.188783 1184.943823 82.856762 17.973125 166.179653 66.302021 14.250000 133.221266 ... 21.577344 1057.189000 441.000000 2012.687500 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 181.000000 220642.903278 118696.953473 354986.790134 34034.136641 18617.866891 53621.772922 27226.335789 14881.136021 42864.666059 ... 6732.675816 83966.963000 36613.800000 152582.075000 67062.759922 34406.925000 119823.753392 17020.169046 8124.409449 32926.901381

8 rows × 28 columns

In [49]:
newICU = hospitalization.groupby('location').sum()['newICU_mean'].to_frame(name = 'New ICU').reset_index()
newICU.head()
Out[49]:
location New ICU
0 Alabama 2311.497728
1 Alaska 327.469650
2 Arizona 3051.483415
3 Arkansas 1414.480269
4 California 10043.113517
In [53]:
newICU_by_location = newICU.sort_values('New ICU').head(25)
plt.figure(figsize=(20,10))
plt.barh('location','New ICU', data=newICU_by_location)
plt.xlabel("New ICU", size=15)
plt.ylabel("location", size=15)
plt.tick_params(axis='x',  labelsize = 18)
plt.tick_params(axis='y', labelsize = 18) 
plt.title("New ICU vs Location", size=30);
In [54]:
#Deaths vs Location
death = hospitalization.groupby('location').sum()['deaths_mean'].to_frame(name = 'Deaths').reset_index()
death.head()
Out[54]:
location Deaths
0 Alabama 1173.094
1 Alaska 148.987
2 Arizona 1575.225
3 Arkansas 729.142
4 California 5085.722
In [57]:
death_by_location = death.sort_values('Deaths').head(25)
plt.figure(figsize=(20,10))
plt.barh('location','Deaths', data=death_by_location)
plt.xlabel("Deaths", size=15)
plt.ylabel("location", size=15)
plt.tick_params(axis='x',  labelsize = 18)
plt.tick_params(axis='y', labelsize = 18) 
plt.title("Deaths vs Location", size=40);

2. Which populations have contracted COVID-19 and require ventilators?

A mechanical ventilator is a machine that’s used to support patients with severe respiratory conditions that impact the lungs, including pneumonia.

Before a patient is placed on a ventilator, medical staff – often anaesthetists – will perform a procedure called intubation. After a patient is sedated and given a muscle relaxant, a tube is placed through the mouth and into the windpipe.

The procedure is routine but, with Covid-19 patients, medical staff need to take extreme precautions to make sure they do not become infected with the virus. The breathing tube is then attached to the ventilator and medical staff can adjust the rate that it pushes the air and oxygen into the lungs, and adjust the oxygen mix. https://www.theguardian.com/world/2020/mar/27/how-ventilators-work-and-why-they-are-so-important-in-saving-people-with-coronavirus

In [29]:
df = pd.read_csv(r'C:\Users\Vejendla\Desktop\CIS-732\uncover\UNCOVER\hifld\hifld\urgent-care-facilities.csv')
df.head(2).style.background_gradient(cmap='summer')
Out[29]:
geometry objectid id name telephone address address2 city state zip zipp4 county fips directions emergtitle emergtel emergext contdate conthow geodate geohow hsipthemes naicscode naicsdescr geolinkid x y st_vendor st_version geoprec phoneloc qc_qa ucaoa_id
0 POINT (-84.1615716333 35.8801187098) 4001 11513140 FARRAGUT WALK-IN CLINIC 865-671-6026 11408 KINGSTON PIKE nan KNOXVILLE TN 37934 3975 KNOX 47093 LOC ON THE SOUTHEAST SIDE OF KINGSTON PIKE, APPR .1 MILES SOUTHWEST OF S CAMPBELL STATION RD. nan nan nan 2009-02-11T00:00:00.000Z PHONE 2009-02-11T00:00:00.000Z MANUAL CRITICAL INFRASTRUCTURE, PDD-63; PUBLIC HEALTH; PRIMARY CARE FACILITIES (INCLUDING HOSPITALS); AMBULATORY SURGICAL FACILITIES 621493 URGENT MEDICAL CARE CENTERS AND CLINICS (EXCEPT HOSPITALS), FREESTANDING 2.46575e+07 -84.1616 35.8801 NAVTEQ 2008Q1 ONENTITY t TGS nan
1 POINT (-84.4223299591 39.1639540845) 4002 10422042 TRISTATE URGENT CARE OF OAKLEY 513-531-1505 5002 RIDGE AVENUE nan CINCINNATI OH 45209 5015 HAMILTON 39061 LOCATED ON THE NORTH WEST CORNER OF RIDGE AVE AND CALVERT ST nan nan nan 2009-01-30T00:00:00.000Z PHONE 2009-01-30T00:00:00.000Z MANUAL CRITICAL INFRASTRUCTURE, PDD-63; PUBLIC HEALTH; PRIMARY CARE FACILITIES (INCLUDING HOSPITALS); AMBULATORY SURGICAL FACILITIES 621493 URGENT MEDICAL CARE CENTERS AND CLINICS (EXCEPT HOSPITALS), FREESTANDING 3.20104e+07 -84.4223 39.164 NAVTEQ 2008Q1 ONENTITY t TGS UC_3004

How soon might a patient need a ventilator and for how long?

Once a doctor sees that a patient needs a ventilator, “it is required quickly”. “The patient can be sustained for short periods of time using manual forms of ventilation such as using a bag and mask system with oxygen, but usually being attached to a ventilator needs to happen within 30 minutes if critical.”

Story says that in severe Covid-19 patients, a life-threatening condition can develop called acute respiratory distress syndrome (Ards) that requires ventilators to deliver smaller volumes of oxygen and air, but at higher rates. This could mean a patient may need to be on a ventilator “for weeks”.

To avoid complications from the breathing tube going down the throat, a tracheostomy is carried out so the tube can go straight into the windpipe through the neck. “Patients can be more awake with tracheostomy and the hole just heals itself,”

In [30]:
df.st_vendor.value_counts()
Out[30]:
NAVTEQ    4748
TGS         62
Name: st_vendor, dtype: int64

Why a shortage of ventilators matters, and what’s being done to avoid it.

One of the most obvious ways to avoid a shortage of ventilators, is to reduce the numbers of people catching the disease in the first place. That means following all the health advice, including social distancing and hygiene rules.

In Australia, the Australian Healthcare and Hospitals Association, the Australia and New Zealand Intensive Care Society and the industry minister, Karen Andrews, have all expressed confidence that a shortage can be avoided. The Australian government is also investigating whether ventilators used on animals in veterinary clinics could be converted. Sleep apnoea machines and anaesthetic machines are also options.

Ventilators used in ambulances could be used as backup. All of that work will be crucial in saving lives if the social distancing measures and community lockdowns don’t stem the flow of patients into critical care. “Health care workers responsible for managing severe life-threatening cases like Covid-19 are extremely concerned regarding their ability to use appropriate support for large numbers of patients expected to suffer respiratory failure.

“In essence, this means that many will not be able to be treated with mechanical ventilation and difficult decisions will have to be made by staff, families and patients about the limits of support. There are many ethical dilemmas in this, and none can be easily resolved.”

In [31]:
# Select the facility records whose street-data vendor is NAVTEQ or TGS
vendors = df.loc[df.st_vendor.isin(['NAVTEQ', 'TGS'])].copy()
vendors.head(2)
Out[31]:
geometry objectid id name telephone address address2 city state zip ... naicsdescr geolinkid x y st_vendor st_version geoprec phoneloc qc_qa ucaoa_id
0 POINT (-84.1615716333 35.8801187098) 4001 11513140 FARRAGUT WALK-IN CLINIC 865-671-6026 11408 KINGSTON PIKE NaN KNOXVILLE TN 37934 ... URGENT MEDICAL CARE CENTERS AND CLINICS (EXCEP... 24657452.0 -84.161572 35.880119 NAVTEQ 2008Q1 ONENTITY t TGS NaN
1 POINT (-84.4223299591 39.1639540845) 4002 10422042 TRISTATE URGENT CARE OF OAKLEY 513-531-1505 5002 RIDGE AVENUE NaN CINCINNATI OH 45209 ... URGENT MEDICAL CARE CENTERS AND CLINICS (EXCEP... 32010448.0 -84.422330 39.163954 NAVTEQ 2008Q1 ONENTITY t TGS UC_3004

2 rows × 33 columns

In [32]:
#Plotting the numeric columns of the vendor-filtered facility records
vendors.plot()
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x1109c67d7c8>
In [33]:
plt.style.use('dark_background')
sns.jointplot(x='naicscode', y='fips', data=df, kind='scatter')
Out[33]:
<seaborn.axisgrid.JointGrid at 0x1109c754e88>
In [34]:
fig=plt.gcf()
fig.set_size_inches(10,7)
fig=sns.violinplot(x='naicscode',y='fips',data=df)
In [35]:
plt.style.use('dark_background')
sns.set(style="darkgrid")
fig=plt.gcf()
fig.set_size_inches(10,7)
fig = sns.swarmplot(x="naicscode", y="fips", data=df)

Covid19 patients and 3D printed valves for Ventilator devices.

Italian hospital saves Covid-19 patients lives by 3D printing valves for reanimation devices.

After the first valves were 3D printed using a filament extrusion system, on location at the hospital, more valves were later 3D printed by another local firm, Lonati SpA, using a polymer laser powder bed fusion process and a custom polyamide-based material.

The Isinnova team has now developed and successfully tested a 3D printed adapter that turns a snorkeling mask into a non-invasive ventilator for COVID-19 patients: an emergency ventilator mask produced by adjusting a commercially available snorkeling mask. It is an idea that can be printed on just about any type of 3D printer, and it could help address the possible shortage of hospital C-PAP masks for sub-intensive oxygen therapy, which is emerging as a concrete problem linked to the spread of COVID-19.

In [36]:
plt.style.use('dark_background')
#sns.set(style="whitegrid")
fig=plt.gcf()
fig.set_size_inches(10,7)
ax = sns.violinplot(x="naicscode", y="fips", data=df, inner=None)
ax = sns.swarmplot(x="naicscode", y="fips", data=df,color="white", edgecolor="black")
In [37]:
df.plot.area(y=['naicscode','fips','zip','id'],alpha=0.4,figsize=(12, 6));
In [38]:
#Heatmap of pairwise correlations between the numeric columns
plt.figure(figsize=(10,4))
sns.heatmap(df.corr(),annot=True,cmap='YlOrRd_r')
plt.show()

New York hospitals need Ventilators and basic supplies.

According to Bill de Blasio, Mayor of New York City, hospitals are running short on equipment. New York City hospitals are just 10 days from running out of “really basic supplies,” Mayor Bill de Blasio said late Sunday.

“If we don’t get the equipment, we’re literally going to lose lives,” de Blasio told CNN. De Blasio has called upon the federal government to boost the city’s quickly dwindling supply of protective equipment. The city also faces a potentially deadly dearth of VENTILATORS to treat those infected by the coronavirus. Health care workers also warned of the worsening shortages, saying they were being asked to reuse and ration disposable masks and gloves.

New York City hospitals scrambled lately to accommodate a new swell of patients, dedicating new COVID-19 wings in their facilities. It remained “extremely busy” at Northwell hospitals, a spokesman said, adding their intensive care units were filling up.

“A number of hospitals have reported that they are becoming overwhelmed,” said Jonah Allon, a spokesman for Brooklyn Borough President Eric Adams.

In [39]:
#heatmap with Pearson Method
corr = df.corr(method='pearson')
sns.heatmap(corr)
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x110a7a4b588>

Conclusion

There aren’t enough ventilators to cope with the coronavirus: the United States and other countries face a critical shortage of the lifesaving machines, and there is no easy way to lift production.

As the United States braces for an onslaught of coronavirus cases, hospitals and governments are confronting a grim reality: There are not nearly enough lifesaving ventilator machines to go around, and there is no way to solve the problem before the disease reaches full throttle.

3. Are patients with travel history affected more than patients without travel history?

  • Constraints : As the above-mentioned data is insufficient to draw meaningful analyses, I couldn't analyse this question further. I will continue searching for the right dataset.
  • To be Analysed

4. Using multiple files/datasets and analyze demographics

We use the datasets uploaded in this notebook to extract the important population parameters of each country.

In [58]:
#Concatenating all of the CSV files available in the dataset folder.

folder_name =  "C:\\Users\\Vejendla\\Desktop\\CIS-732\\covcsd-covid19-countries-statistical-dataset"
file_type = 'csv'
separator = ','
dataframe = pd.concat([pd.read_csv(f, sep=separator) for f in glob.glob(folder_name + "/*."+file_type)],ignore_index=True,sort=False)

#Selecting the columns that are essential for the data-wrangling task
covid_data = dataframe[['Date', 'State', 'Country', 'Cumulative_cases', 'Cumulative_death',
       'Daily_cases', 'Daily_death', 'Latitude', 'Longitude', 'Temperature',
       'Min_temperature', 'Max_temperature', 'Wind_speed', 'Precipitation',
       'Fog_Presence', 'Population', 'Population Density/km', 'Median_Age',
       'Sex_Ratio', 'Age%_65+', 'Hospital Beds/1000', 'Available Beds/1000',
       'Confirmed Cases/1000', 'Lung Patients (F)', 'Lung Patients (M)',
       'Life Expectancy (M)', 'Life Expectancy (F)', 'Total_tests_conducted',
       'Out_Travels (mill.)', 'In_travels(mill.)', 'Domestic_Travels (mill.)']]
covid_data.head()
Out[58]:
Date State Country Cumulative_cases Cumulative_death Daily_cases Daily_death Latitude Longitude Temperature ... Available Beds/1000 Confirmed Cases/1000 Lung Patients (F) Lung Patients (M) Life Expectancy (M) Life Expectancy (F) Total_tests_conducted Out_Travels (mill.) In_travels(mill.) Domestic_Travels (mill.)
0 22-01-2020 NaN Afghanistan 0.0 0.0 0.0 0.0 33.0 65.0 5.89 ... 0.21 0.0 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported
1 23-01-2020 NaN Afghanistan 0.0 0.0 0.0 0.0 33.0 65.0 5.56 ... 0.21 0.0 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported
2 24-01-2020 NaN Afghanistan 0.0 0.0 0.0 0.0 33.0 65.0 4.50 ... 0.21 0.0 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported
3 25-01-2020 NaN Afghanistan 0.0 0.0 0.0 0.0 33.0 65.0 7.78 ... 0.21 0.0 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported
4 26-01-2020 NaN Afghanistan 0.0 0.0 0.0 0.0 33.0 65.0 6.00 ... 0.21 0.0 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported

5 rows × 31 columns

In [68]:
#Filtering the dataset to view the latest contents (as of 30-03-2020)
latest_data = covid_data['Date'] == '30-03-2020'
country_data_detailed = covid_data[latest_data].copy()  # .copy() avoids SettingWithCopyWarning on the drops below

#Dropping unnecessary columns from the country_data_detailed dataset
country_data_detailed.drop(['Daily_cases','Daily_death','Latitude','Longitude'],axis=1,inplace=True)

#Viewing the dataset
country_data_detailed.head(3)
Out[68]:
Date State Country Cumulative_cases Cumulative_death Temperature Min_temperature Max_temperature Wind_speed Precipitation ... Available Beds/1000 Confirmed Cases/1000 Lung Patients (F) Lung Patients (M) Life Expectancy (M) Life Expectancy (F) Total_tests_conducted Out_Travels (mill.) In_travels(mill.) Domestic_Travels (mill.)
68 30-03-2020 NaN Afghanistan 170.0 4.0 6.17 0.28 11.61 6.1 0.0 ... 0.210 0.004367 36.31 39.33 63.2 63.6 1019 1.5616 Not Reported Not Reported
139 30-03-2020 NaN Albania 223.0 11.0 12.78 9.50 18.22 2.0 0.0 ... 0.725 0.077490 7.02 17.04 76 81.6 1526 5415 5927 Not Reported
208 30-03-2020 NaN Algeria 584.0 35.0 23.78 18.00 29.50 11.7 0.0 ... 0.475 0.013000 5.03 12.18 75.8 78.7 7560 Not Reported 2657 Not Reported

3 rows × 27 columns

In [69]:
#Replacing the text 'Not Reported' and 'N/A' with numpy missing values (np.nan)
country_data_detailed.replace('Not Reported',np.nan,inplace=True)
country_data_detailed.replace('N/A',np.nan,inplace=True)

#Viewing the dataset
country_data_detailed.head(3)
Out[69]:
Date State Country Cumulative_cases Cumulative_death Temperature Min_temperature Max_temperature Wind_speed Precipitation ... Available Beds/1000 Confirmed Cases/1000 Lung Patients (F) Lung Patients (M) Life Expectancy (M) Life Expectancy (F) Total_tests_conducted Out_Travels (mill.) In_travels(mill.) Domestic_Travels (mill.)
68 30-03-2020 NaN Afghanistan 170.0 4.0 6.17 0.28 11.61 6.1 0.0 ... 0.210 0.004367 36.31 39.33 63.2 63.6 1019 1.5616 NaN NaN
139 30-03-2020 NaN Albania 223.0 11.0 12.78 9.50 18.22 2.0 0.0 ... 0.725 0.077490 7.02 17.04 76.0 81.6 1526 5415 5927 NaN
208 30-03-2020 NaN Algeria 584.0 35.0 23.78 18.00 29.50 11.7 0.0 ... 0.475 0.013000 5.03 12.18 75.8 78.7 7560 NaN 2657 NaN

3 rows × 27 columns

In [70]:
#Converting the datatypes
country_data_detailed['Lung Patients (F)'].replace('Not reported',np.nan,inplace=True)
country_data_detailed['Lung Patients (F)'] = country_data_detailed['Lung Patients (F)'].astype("float")
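The cleaning steps above (filter to one date, drop columns, replace sentinel strings with NaN, cast to float) can be bundled into one small helper. A sketch on toy data — the column names match the dataset, but the helper name and toy values are mine:

```python
import numpy as np
import pandas as pd

def clean_snapshot(df, date, drop_cols, sentinels=('Not Reported', 'Not reported', 'N/A')):
    """Filter to a single date, drop columns, and turn sentinel strings into NaN."""
    snap = df[df['Date'] == date].drop(columns=drop_cols).copy()
    snap.replace(list(sentinels), np.nan, inplace=True)
    return snap

# Toy stand-in mimicking covid_data
toy = pd.DataFrame({
    'Date': ['29-03-2020', '30-03-2020'],
    'Country': ['Albania', 'Albania'],
    'Daily_cases': [12, 11],
    'Lung Patients (F)': ['7.02', 'Not Reported'],
})

snap = clean_snapshot(toy, '30-03-2020', drop_cols=['Daily_cases'])
snap['Lung Patients (F)'] = snap['Lung Patients (F)'].astype('float')
```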

Understanding the dataset generated above

The dataset holds information about:

  • The name of the country
  • Total deaths and cases reported from COVID-19 as of March 30th 2020
  • Latitude and Longitude of the country
  • Other demographics
In [71]:
#Getting the dataset to check the correlation 
corr_data = country_data_detailed.drop(['Date','State','Country','Min_temperature','Max_temperature','Out_Travels (mill.)',
                                        'In_travels(mill.)','Domestic_Travels (mill.)','Total_tests_conducted','Age%_65+'], axis=1)

#Converting the dataset to the correlation function
corr = corr_data.corr()

Which demographic factors play a role in transmission across populations?

In [72]:
def heatmap(x, y, size,color):
    fig, ax = plt.subplots(figsize=(20,3))
    
    # Mapping from column names to integer coordinates
    x_labels = corr_data.columns
    y_labels = ['Cumulative_cases', 'Cumulative_death']
    x_to_num = {p[1]:p[0] for p in enumerate(x_labels)} 
    y_to_num = {p[1]:p[0] for p in enumerate(y_labels)} 
    
    n_colors = 256 # Use 256 colors for the diverging color palette
    palette = sns.cubehelix_palette(n_colors) # Create the palette
    color_min, color_max = [-1, 1] # Range of values that will be mapped to the palette, i.e. min and max possible correlation

    def value_to_color(val):
        val_position = float((val - color_min)) / (color_max - color_min) # position of value in the input range, relative to the length of the input range
        ind = int(val_position * (n_colors - 1)) # target index in the color palette
        return palette[ind]

    
    ax.scatter(
        x=x.map(x_to_num),
        y=y.map(y_to_num),
        s=size * 1000,
        c=color.apply(value_to_color), # Vector of square color values, mapped to color palette
        marker='s'
    )
    
    # Show column labels on the axes
    ax.set_xticks([x_to_num[v] for v in x_labels])
    ax.set_xticklabels(x_labels, rotation=30, horizontalalignment='right')
    ax.set_yticks([y_to_num[v] for v in y_labels])
    ax.set_yticklabels(y_labels)
    
    
    ax.set_xticks([t + 0.5 for t in ax.get_xticks()], minor=True)
    ax.set_yticks([t + 0.5 for t in ax.get_yticks()], minor=True)
    
    ax.set_xlim([-0.5, max([v for v in x_to_num.values()]) + 0.5]) 
    ax.set_ylim([-0.5, max([v for v in y_to_num.values()]) + 0.5])
    
corr = pd.melt(corr.reset_index(), id_vars='index') 
corr.columns = ['x', 'y', 'value']
heatmap(x=corr['x'],y=corr['y'],size=corr['value'].abs(),color=corr['value'])
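The `value_to_color` mapping inside `heatmap()` above is a plain linear rescale of the correlation range [-1, 1] onto palette indices 0..255. Extracted in isolation (the function name here is mine):

```python
# Standalone version of the palette-index mapping used in heatmap() above:
# a correlation in [-1, 1] is rescaled linearly onto indices 0..n_colors-1.
def corr_to_index(val, color_min=-1.0, color_max=1.0, n_colors=256):
    val_position = (val - color_min) / (color_max - color_min)
    return int(val_position * (n_colors - 1))

print(corr_to_index(-1.0), corr_to_index(0.0), corr_to_index(1.0))  # 0 127 255
```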

Initial Analysis from the datasets

With weak correlations, we observe the following trends:

With the rise in temperature, the confirmed cases tend to slow down (negative correlation). However, substantial proof needs to be added here. In upcoming versions of this notebook I'll analyze the trends across all the days to check the temperature effect.

Median age tends to affect the cases: for a country with a higher median age, cases tend to increase.

Life expectancy also seems to be weakly correlated with confirmed COVID-19 cases. The effect is more prominent in males than in females.

We look forward to analyzing more datasets for correlation, since the correlations obtained here are too weak to draw conclusions.

Role of Temperature and Climate in the Spread of COVID-19?

In [73]:
#Reading the temperature data file
temperature_data = pd.read_csv(r'C:\Users\Vejendla\Desktop\CIS-732\covcsd-covid19-countries-statistical-dataset\temperature_data.csv')
#Viewing the dataset
temperature_data.head(2)
Out[73]:
Date State Country Cumulative_cases Cumulative_death Daily_cases Daily_death Latitude Longitude Temperature Wind_speed Precipitation Fog_Presence
0 22-01-2020 NaN Afghanistan 0 0 0 0 33.0 65.0 5.89 9.4 0.0 0
1 23-01-2020 NaN Afghanistan 0 0 0 0 33.0 65.0 5.56 14.9 0.0 1
In [74]:
#Checking the dependence of Temperature on Confirmed COVID-19 Cases
unique_temp = temperature_data['Temperature'].unique()
confirmed_cases = []
deaths = []

for temp in unique_temp:
    temp_wise = temperature_data['Temperature'] == temp
    test_data = temperature_data[temp_wise]
    
    confirmed_cases.append(test_data['Daily_cases'].sum())
    deaths.append(test_data['Daily_death'].sum())
    
#Converting the lists to a pandas dataframe.
temperature_dataset = {'Temperature' : unique_temp, 'Confirmed' : confirmed_cases, 'Deaths' : deaths}
temperature_dataset = pd.DataFrame(temperature_dataset)
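The `unique()`/boolean-mask loop above can be expressed more idiomatically with a single `groupby` aggregation. A sketch on toy data with the same column names:

```python
import pandas as pd

# Toy stand-in for temperature_data
toy = pd.DataFrame({
    'Temperature': [5.0, 5.0, 7.5, 7.5, 7.5],
    'Daily_cases': [1, 2, 3, 4, 5],
    'Daily_death': [0, 1, 0, 0, 2],
})

# One groupby replaces the per-temperature loop: sum cases and deaths per temperature
temperature_dataset = (toy.groupby('Temperature', as_index=False)
                          .agg(Confirmed=('Daily_cases', 'sum'),
                               Deaths=('Daily_death', 'sum')))
```

Besides being shorter, the groupby avoids scanning the full frame once per unique temperature value.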

Analysis of Temperature and Confirmed Cases via Plotly Graphs

In [75]:
#Plotting a scatter plot for cases vs. Temperature

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(go.Scattergl(x=temperature_dataset['Temperature'], y=temperature_dataset['Confirmed'], mode='markers',
                           marker=dict(color=np.random.randn(len(temperature_dataset)), colorscale='Viridis', line_width=1)), secondary_y=False)
fig.add_trace(go.Box(x=temperature_dataset['Temperature']), secondary_y=True)

fig.update_layout(title='Daily Confirmed Cases (COVID-19) vs. Temperature (Celsius) : Global Figures - January 22 - March 30 2020',
                  yaxis=dict(title='Reported Numbers'), xaxis=dict(title='Temperature in Celsius'))
fig.update_yaxes(title_text="BoxPlot Range", secondary_y=True)
fig.show()

Digging deeper into understanding the effect of temperature

We import a dataset : Weather Data for COVID-19 Data Analysis, uploaded by Davin Bonin. This dataset contains information about temperature and other weather figures for the countries with confirmed COVID-19 infections. The dataset is updated till April 14th 2020.

Conducting Hypothesis Testing

In [76]:
sample = temperature_dataset['Temperature'].sample(n=250)
test = temperature_dataset['Temperature']

from scipy.stats import ttest_ind

stat, p = ttest_ind(sample, test)
print('Statistics=%.3f, p=%.3f' % (stat, p))
Statistics=-0.187, p=0.851

Since we get a p-value > 0.05, we fail to reject the null hypothesis: there is no statistically significant difference between the sample and the full temperature data — which is expected, since the sample is drawn from that same data. This test alone therefore cannot establish an effect of temperature on the spread of COVID-19. The idea that COVID-19 spreads within a certain range of temperatures needs more data and more appropriate statistical testing before a substantial conclusion can be drawn.
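A more direct comparison (a sketch, not the method used above) would be to split the observations into "cold" and "warm" halves at the median temperature and run a two-sample t-test on the daily case counts of the two groups. Shown here on synthetic data, since the real file isn't bundled; column names match the dataset:

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-in for temperature_data
df = pd.DataFrame({
    'Temperature': rng.uniform(-5, 35, 400),
    'Daily_cases': rng.poisson(20, 400),
})

median_t = df['Temperature'].median()
cold = df.loc[df['Temperature'] <= median_t, 'Daily_cases']
warm = df.loc[df['Temperature'] > median_t, 'Daily_cases']

stat, p = ttest_ind(cold, warm, equal_var=False)  # Welch's t-test, unequal variances
print('Statistics=%.3f, p=%.3f' % (stat, p))
```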

Dependency of COVID-19 Spread on certain Health/Demographic Figures : USA

Do certain population/health demographics affect the spread of COVID-19, or is the spread completely random? - Case Study of USA

Loading the datasets

We load the following datasets from the UNCOVER COVID-19 Challenge datasets

  1. US Counties COVID-19 Dataset : Available on Kaggle by MyrnaMFL
  2. UNCOVER COVID-19 USAFacts Dataset : Confirmed Covid-19 Cases in US by county and state.
  3. CovCSD : COVID-19 Countries Statistical Dataset, prepared by me.
In [77]:
#Loading US County Wise Confirmed Cases Dataset
usa_cases_tot = pd.read_csv(r'C:\Users\Vejendla\Desktop\CIS-732\covcsd-covid19-countries-statistical-dataset\us-county.csv',dtype={"fips": str})

#Viewing the data
usa_cases_tot.head()
Out[77]:
fips state county Confirmed Deaths Smokers Obesity Food Environment index Exercise overcrowding Diabetics Insufficient Sleep Traffic Volume 65% Above Population Rural Population
0 1001 Alabama Autauga 19 1.0 18.081557 33.3 7.2 69.130124 1.201923 11.1 35.905406 88.457040 15.562670 42.002162
1 1003 Alabama Baldwin 78 1.0 17.489033 31.0 8.0 73.713549 1.270792 10.7 33.305868 86.997430 20.443350 42.279099
2 1005 Alabama Barbour 10 0.0 21.999985 41.7 5.6 53.166770 1.688596 17.6 38.563167 102.291762 19.420441 67.789635
3 1007 Alabama Bibb 17 0.0 19.114200 37.6 7.8 16.251364 0.255319 14.5 38.148865 29.335580 16.473214 68.352607
4 1009 Alabama Blount 15 0.0 19.208672 33.8 8.4 15.634486 1.891368 17.0 35.945010 33.411782 18.236515 89.951502
In [78]:
#Getting the geo-json files
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

#Plotting the data    
py.init_notebook_mode(connected=True)

usa_cases_tot['log_ConfirmedCases'] = np.log(usa_cases_tot.Confirmed + 1)
usa_cases_tot['fips'] = usa_cases_tot['fips'].astype(str).str.rjust(5,'0')
 
fig = px.choropleth(usa_cases_tot, geojson=counties, locations='fips', color='log_ConfirmedCases',
                           color_continuous_scale="Viridis",
                           range_color=(0, 12),
                           scope="usa")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
py.offline.iplot(fig)

Understandings from the Choropleth Map Generated Above

The spread of COVID-19 is concentrated around the eastern coastal side of the US. New York is the major epicenter for the US, and counties near New York have a higher concentration of cases than those farther away. The area around Chicago also has a higher case density than other parts of the US.

Case counts in the US

New York City, Nassau, Suffolk, and Westchester have the highest reported cases of COVID-19. We will further look at the demographic distribution of these regions to analyze the trends on a much better scale.

Does the spread of COVID-19 across US counties have any relation with health indices?

To analyze this statement we look at our generated dataset

In [52]:
#Getting the dataset to check the correlation 
corr_data = usa_cases_tot.drop(['fips','state','county'], axis=1)

#Converting the dataset to the correlation function
corr = corr_data.corr()

#Plotting a heatmap

def heatmap(x, y, size,color):
    fig, ax = plt.subplots(figsize=(20,10))
    
    # Mapping from column names to integer coordinates
    x_labels = corr_data.columns
    y_labels = corr_data.columns
    x_to_num = {p[1]:p[0] for p in enumerate(x_labels)} 
    y_to_num = {p[1]:p[0] for p in enumerate(y_labels)} 
    
    n_colors = 256 # Use 256 colors for the diverging color palette
    palette = sns.cubehelix_palette(n_colors) # Create the palette
    color_min, color_max = [-1, 1] # Range of values that will be mapped to the palette, i.e. min and max possible correlation

    def value_to_color(val):
        val_position = float((val - color_min)) / (color_max - color_min) # position of value in the input range, relative to the length of the input range
        ind = int(val_position * (n_colors - 1)) # target index in the color palette
        return palette[ind]

    
    ax.scatter(
        x=x.map(x_to_num),
        y=y.map(y_to_num),
        s=size * 1000,
        c=color.apply(value_to_color), # Vector of square color values, mapped to color palette
        marker='s')
    
    # Show column labels on the axes
    ax.set_xticks([x_to_num[v] for v in x_labels])
    ax.set_xticklabels(x_labels, rotation=30, horizontalalignment='right')
    ax.set_yticks([y_to_num[v] for v in y_labels])
    ax.set_yticklabels(y_labels)
    
    
    ax.set_xticks([t + 0.5 for t in ax.get_xticks()], minor=True)
    ax.set_yticks([t + 0.5 for t in ax.get_yticks()], minor=True)
    
    ax.set_xlim([-0.5, max([v for v in x_to_num.values()]) + 0.5]) 
    ax.set_ylim([-0.5, max([v for v in y_to_num.values()]) + 0.5])
    
corr = pd.melt(corr.reset_index(), id_vars='index') 
corr.columns = ['x', 'y', 'value']
heatmap(x=corr['x'],y=corr['y'],size=corr['value'].abs(),color=corr['value'])
In [53]:
#Plotting a scatter plot for cases vs. Traffic Volume
fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(go.Scattergl(y=usa_cases_tot['Traffic Volume'], x=usa_cases_tot['Confirmed'], mode='markers',
                           marker=dict(color=np.random.randn(len(usa_cases_tot)), colorscale='Viridis', line_width=1)), secondary_y=False)

fig.update_layout(title='Daily Confirmed Cases (COVID-19) vs. Traffic Volume : US Figures - January 22 - April 14 2020',
                  xaxis=dict(title='Reported Numbers'), yaxis=dict(title='Traffic Volume'))

fig.show()

sample = usa_cases_tot['Traffic Volume'].sample(n=250)
test = usa_cases_tot['Traffic Volume']

from scipy.stats import ttest_ind

stat, p = ttest_ind(sample, test)
print('Statistics=%.3f, p=%.3f' % (stat, p))
Statistics=0.512, p=0.608

Observations from the above Heatmap over Population Habits

None of the figures like smoker percentage in the population, obesity, or diabetes appear to affect the spread of COVID-19 infections in the US in general.

A certain correlation is observed between the number of confirmed cases in a county and the traffic congestion present for that county (as of 2020). The correlation between the variables is 0.613053. This might be significant, as quarantine and the total isolation of people restricting movement across US counties came late in comparison to countries like India/Korea/China/Japan. Hence, asymptomatic carriers may have spread the virus while movement was unrestricted in counties with high traffic congestion.

The p-value is high, so we fail to reject the null hypothesis in this test (which, as above, only compares a sample against the full Traffic Volume column). More research is needed to make this an evident conclusion.
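The 0.613 figure quoted above is a plain Pearson correlation; its statistical significance can be checked directly with `scipy.stats.pearsonr`, which returns both the coefficient and a p-value. A sketch on synthetic data (the county file isn't bundled here, so the positive relation is built in by construction):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
# Synthetic stand-ins for the 'Traffic Volume' and 'Confirmed' columns
traffic = rng.uniform(10, 200, 300)
confirmed = 0.5 * traffic + rng.normal(0, 20, 300)  # positive relation plus noise

r, p = pearsonr(confirmed, traffic)
print('r=%.3f, p=%.3g' % (r, p))
```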

Are certain age/gender groups at a higher risk of contracting COVID-19?

Investigating the role of the age/gender of a population and its relation with the COVID-19 infection rate.

  • Constraints : As the above-mentioned data is insufficient to draw out meaningful analyses, I couldn't analyze this question further. I will continue to search for the right dataset.
  • To be Analysed
In [ ]: